An Efficient Parallel Algorithm for High Dimensional Similarity Join - Parallel Processing Symposium, 1998, and Symposium on Parallel and Distributed Processing 1998. 19
Abstract
Multidimensional similarity join finds pairs of multidimensional points that are within some small distance of each other. The ε-k-d-B tree has been proposed as a data structure that scales better than previous data structures as the number of dimensions increases. We present a cost model of the ε-k-d-B tree and use it to optimize the leaf size. We present novel parallel algorithms for the similarity join using the ε-k-d-B tree. A load-balancing strategy based on equi-depth histograms is shown to work well for uniform or low-skew situations, whereas another based on weighted equi-depth histograms works far better for high-skew datasets. The latter strategy is only slightly slower than the former strategy for low-skew datasets. Further, its cost is proportional to the overall cost of the similarity join.
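As a rough illustration of the two ideas named in the abstract, the sketch below shows a brute-force similarity join and a plain equi-depth split of values into buckets. It is not the ε-k-d-B tree algorithm or the paper's load-balancing scheme; the function names, the choice of Euclidean distance, and the one-dimensional split are assumptions made only for this example.

```python
import math
from itertools import combinations


def similarity_join(points, eps):
    """Brute-force join: index pairs (i, j), i < j, whose points lie within eps.

    Assumes Euclidean distance; the abstract only says "some small distance".
    """
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if math.dist(p, q) <= eps
    ]


def equi_depth_boundaries(values, parts):
    """Return parts - 1 split values so each bucket holds about the same count.

    Hypothetical helper: a plain (unweighted) equi-depth histogram on one
    dimension, which could be used to assign value ranges to processors.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // parts] for k in range(1, parts)]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0), (1.0, 1.05)]
    print(similarity_join(pts, 0.1))
    print(equi_depth_boundaries([5, 1, 9, 3, 7, 2, 8, 4], 4))
```

The brute-force join compares every pair and so costs quadratic time, which is the cost an index structure is meant to avoid. The equi-depth split balances point counts only; it does not account for how much join work each bucket generates, which is the gap the weighted variant mentioned in the abstract addresses.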
Similar resources
An Efficient RMS Admission Control and its Application to Multiprocessor Scheduling - Parallel Processing Symposium, 1998, and Symposium on Parallel and Distributed Processing 1998. 19
A real-time system must execute functionally correct computations in a timely manner. In order to guarantee that all tasks accepted in the system will meet their timing requirements, an admission control algorithm must be used to only accept tasks whose requirements can be satisfied. Rate-monotonic scheduling (RMS) is arguably the best known scheduling policy for periodic real-time tasks on uni...
Experimental Validation of Parallel Computation Models on the Intel Paragon - Parallel Processing Symposium, 1998, and Symposium on Parallel and Distributed Processing 1998. 19
Experimental data validating some of the proposed parallel computation models on the Intel Paragon is presented. This architecture is characterized by a large bandwidth and a relatively large startup cost of a message transmission, which makes it extremely important to employ bulk transfers. The models considered are the BSP model, in which it is assumed that all messages have a fixed short siz...
Solving the Problem of Scheduling Unrelated Parallel Machines with Limited Access to Jobs
Nowadays, with the successful application of the on-time production concept in areas such as production management and storage, completing jobs by their delivery time is considered a key issue in industrial environments. Unrelated parallel machine scheduling is a general case of the classic parallel machine problems. In some of the applications of unrelated parallel mac...
High Performance Implementation of Fuzzy C-Means and Watershed Algorithms for MRI Segmentation
Image segmentation is one of the most common steps in digital image processing. Many image segmentation algorithms (e.g., thresholding, edge detection, and region growing) are employed for classifying a digital image into different segments. In this connection, finding a suitable algorithm for medical image segmentation is a challenging task, mainly due to the noise, low contrast, and steep...